Inventi Impact: Image & Video Processing

Inventi:eiv/115069/26

A Hybrid ResNet50-Vision Transformer Model with an Attention Mechanism for Aerial Image Classification

01-Jul-2026 Research 2026 : July-September

Amr Aboghanem, Mohamed Abd Elfattah, Hanan M Amer, Abeer Tawkol Khalil

Aerial image classification remains an open challenge because aerial scenes are complex and highly varied. To address this complexity and variation, this paper proposes two hybrid classification models. The first combines features extracted by ResNet-50 and the Vision Transformer (ViT), then applies multi-head attention (MHA) to emphasize the most informative features. The second also extracts features with ResNet-50 and ViT, then applies cross-attention. Both models are evaluated on the benchmark Sikkim Aerial Images Dataset for Object Detection (SAIOD) using well-established performance metrics, including precision, recall, F1-score, and the ROC curve. The MHA-based model performs best, with an accuracy of 95.80%, while the cross-attention model reaches 95.52%. Both models outperform the best existing methods.
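For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how per-class precision, recall, and F1-score, along with overall accuracy, can be computed from true and predicted labels. It is not the authors' evaluation code: the function names and the toy class labels ("forest", "river", "road") are hypothetical and are not taken from SAIOD.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but was wrong
            fn[t] += 1      # true class t was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_average(metrics):
    """Unweighted mean of (precision, recall, f1) across classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Hypothetical toy labels, purely for illustration (not SAIOD data).
y_true = ["forest", "forest", "river", "road", "road", "river"]
y_pred = ["forest", "river", "river", "road", "forest", "river"]

scores = per_class_metrics(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy(y_true, y_pred):.2f}")
```

Macro averaging weights every class equally, which is one common choice for multi-class problems; the abstract does not state which averaging scheme the paper uses.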

How to Cite this Article
Attribution/CC Compliant Citation: Aboghanem, A., Abd Elfattah, M., M. Amer, H. et al. A hybrid ResNet50-vision transformer model with an attention mechanism for aerial image classification. Sci Rep 16, 5940 (2026). https://doi.org/10.1038/s41598-026-36492-4 http://creativecommons.org/licenses/by/4.0/ Some formatting elements, header, footer, logos, dates and pagination were modified while adapting this article.